Introduction

Starbucks’ gargantuan $26 billion in revenue in 2019 is resounding evidence that Americans, above all, love their coffee. They prove it time and again, most recently during the current pandemic: some, unable to relinquish their daily ‘bucks, waited for hours in long drive-thru lines to get their fix when coffee shops were closed for in-person service.

It is no surprise, then, that Starbucks owns more than 8,000 stores across the US and continues to grow every day. These numbers reflect a company that is about more than just coffee: it has a significant influence on US society and culture through (often controversial) initiatives such as removing religious references from holiday-themed cups. In addition, Starbucks is known for treating its employees extremely well by providing health coverage, tuition coverage, and 401(k) plans, which again reinforces its highly regarded brand.

As such, the astounding popularity of the chain in the country and the prospect of further expansion raise important questions. What can the current distribution of Starbucks stores tell us about societal factors across the country? Which factors should the brand consider when expanding into new locations? How can companies like Starbucks, which pride themselves on a positive social and environmental outlook, incorporate such values into their corporate strategy - especially in the current pandemic?

Overview of Analysis

To answer these questions, we sought to understand the relationship between state-level social and economic factors, such as income and unemployment, and the distribution of Starbucks locations in the US. Furthermore, we wanted to understand what company behavior and decision-making could look like with a more socially minded approach, rather than one that considers profits alone. Starbucks shops, after all, can be significant contributors to social good at a local level by providing stable jobs that open up avenues for growth and education.

As such, in this project, we conduct a consulting case study of Starbucks, analysing what insights can be gained from the current distribution of stores in the US and providing socially minded recommendations for new locations into which the brand can expand. We start with a spatial analysis of Starbucks stores across the US, exploring the relationship between store density and the income and unemployment levels in each state. We then complement this with a text-based sentiment analysis of recent tweets mentioning Starbucks, and explore how these sentiments vary across locations. Finally, we use these variables to conduct a clustering analysis of US states, in order to determine the different “types” of states in which Starbucks operates. We then use these clusters to identify the ideal locations for the brand to build new stores.

Data

In order to conduct this analysis, we collected the following data:

  • The Starbucks Locations Worldwide dataset, available on Kaggle and provided by Starbucks Corporation, which includes a record for every Starbucks or subsidiary store location that was in operation in February 2017.
  • The US Household Income Statistics dataset, available on Kaggle and provided by the Golden Oak Research group, containing information on US cities’ average income.
  • The US Unemployment Rate by County dataset, available on Kaggle and provided by the US Department of Labor’s Bureau of Labor Statistics, containing unemployment and population data at the US county level from 1990 to 2016.
  • The most recent 18,000 tweets containing the string “Starbucks” or “starbucks” in their text, scraped using the RTweet package to interface with the Twitter API.

Check the links in the bullets above for access to each of the data sources!
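
As a rough illustration of how store records can be rolled up into the per-state store counts used later in the analysis, here is a minimal Python sketch. It is not the code used in this project: the inline sample rows are made up, and the column names `Country` and `State/Province` are assumptions about the Kaggle file layout.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for the store directory file. The column names
# "Country" and "State/Province" are assumed, not confirmed by this report.
SAMPLE_CSV = """Store Number,City,State/Province,Country
1001,Seattle,WA,US
1002,Spokane,WA,US
1003,Des Moines,IA,US
1004,Vancouver,BC,CA
"""

def stores_per_state(csv_text, country="US"):
    """Count store records per state for a single country."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row["State/Province"] for row in reader if row["Country"] == country
    )

counts = stores_per_state(SAMPLE_CSV)
print(counts.most_common())
```

The same counting step would apply at the county level, provided each store record can be matched to a county.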

Distribution of Starbucks Across the US

TEXT EXPLAINING INTRODUCTION VISUAL OF STARBUCKS LOCATIONS Text explaining map, transition from current clusters of starbucks to clusters of potential locations.

General Process Towards Selecting New Locations:

  1. Find States
  2. Find Counties

Clustering by State

Paragraph about how we went about determining clusters of states, and explaining why we want to use clusters and explaining factors we use

State-Level Factors

  • Consumer Sentiments
  • Average Income
  • Average Unemployment Rate
  • Number of Starbucks

Consumer Sentiments Across the US

Before including consumer sentiments in the clustering analysis, it was important to understand this information on its own. Our initial hypothesis was that states with a higher density of Starbucks stores would tweet more positive and fewer negative opinions. However, the plots below show the opposite: states with more Starbucks stores, such as California, Texas, Florida, and New York, all have a lower proportion of positive words and a higher proportion of negative words in their tweets than other states in the US. Conversely, states with fewer Starbucks stores, such as Wisconsin, Iowa, Kansas, and Mississippi, have a higher proportion of positive words and a lower proportion of negative words. As far as consumer sentiment is concerned, this suggests that it may be more beneficial for Starbucks to build stores in states where it is not yet heavily established.
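
To make the "proportion of positive and negative words" measure concrete, here is a minimal Python sketch of one way to compute it per state. It rests on several assumptions: the tiny word lists are hypothetical stand-ins for a real sentiment lexicon, the example tweets are made up, and the share is taken over all words in a state's tweets, which may differ from the denominator used in our plots.

```python
from collections import defaultdict

# Hypothetical mini-lexicon for illustration only; a real analysis would
# rely on a published sentiment lexicon.
POSITIVE = {"love", "great", "delicious"}
NEGATIVE = {"hate", "slow", "burnt"}

def sentiment_proportions(tweets):
    """tweets: iterable of (state, text) pairs.

    Returns {state: (positive_share, negative_share)}, where each share is
    the fraction of all words in that state's tweets found in the lexicon.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # positive, negative, all words
    for state, text in tweets:
        for raw in text.lower().split():
            word = raw.strip(".,!?#@\"'")
            if not word:
                continue
            tally = totals[state]
            tally[0] += word in POSITIVE
            tally[1] += word in NEGATIVE
            tally[2] += 1
    return {s: (p / n, q / n) for s, (p, q, n) in totals.items() if n}

# Made-up example tweets, not taken from the scraped data.
props = sentiment_proportions([
    ("IA", "Love my Starbucks latte!"),
    ("CA", "Starbucks line is slow"),
    ("CA", "I hate burnt coffee"),
])
print(props)
```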

State-Level Clusters

Explain cluster analysis and elbow plot helped to determine 5 clusters were needed
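
The elbow heuristic can be sketched as follows: run k-means for a range of cluster counts, record the total within-cluster sum of squares for each, and look for the point where adding clusters stops paying off. The Python code below is a plain, illustrative implementation of Lloyd's algorithm on made-up points. It is not the code used in this project, and it assumes the input features have already been scaled.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wss(points, k, iters=100, seed=0):
    """Run Lloyd's k-means and return the total within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct data points as starting centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Made-up toy points with two obvious groups, not project data.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
wss = {k: kmeans_wss(points, k) for k in range(1, len(points) + 1)}
for k, value in wss.items():
    print(k, round(value, 2))
```

Plotting `wss` against `k` gives the elbow plot; the "elbow" is the value of `k` after which the curve flattens.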

Visualization of Cluster

Based on this plot, we chose 5 clusters, which are shown in this visualization:

Cluster Characteristics

Explain we can see characteristics of each cluster through this matrix. Explain each cluster’s unique properties
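
One simple way to build such a matrix of cluster characteristics is to average each factor over the members of each cluster. The Python sketch below illustrates the idea; the feature rows (income, unemployment rate, store count) and cluster labels are made up for the example and are not our results.

```python
from collections import defaultdict

def cluster_profile(rows, labels):
    """rows: list of feature tuples; labels: one cluster id per row.

    Returns {cluster_id: tuple of per-feature means for that cluster}.
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*members))
        for label, members in sorted(groups.items())
    }

# Made-up rows: (average income, average unemployment rate, number of stores).
rows = [(50000, 6.0, 10), (60000, 4.0, 30), (40000, 8.0, 2), (42000, 9.0, 4)]
labels = [1, 1, 2, 2]
profile = cluster_profile(rows, labels)
print(profile)
```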

Final List of States

Then, we calculated the center of each cluster and identified the state that best represents it: the state with the minimal distance between its own characteristic values and those of the cluster center. From this analysis, we concluded that the final 5 states, each representative of its own cluster, are:

  • Virginia
  • Vermont
  • Tennessee
  • Iowa
  • California
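
The selection of a representative state can be sketched as a nearest-to-centroid search within a single cluster. The Python example below is a minimal illustration under two assumptions: Euclidean distance is used, and the features have already been standardized. The state names and values are placeholders, not our data.

```python
import math

def representative(rows):
    """rows: {name: feature tuple} for the members of ONE cluster.

    Returns the name whose features lie closest (Euclidean distance)
    to the cluster centroid, i.e. the per-feature mean.
    """
    center = [sum(col) / len(col) for col in zip(*rows.values())]
    return min(rows, key=lambda name: math.dist(rows[name], center))

# Placeholder members of one cluster with already-standardized features.
cluster = {
    "state_a": (0.0, 0.0),
    "state_b": (1.0, 1.0),
    "state_c": (0.8, 0.2),
}
best = representative(cluster)
print(best)
```

Running this once per cluster yields one representative per cluster, which is how a list of five states arises from five clusters.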

Clustering by County

County-Level Factors

  • Average Income
  • Average Unemployment Rate
  • Number of Starbucks

Step-by-Step Process

  1. Cluster Analysis
  2. Creating clusters based on appropriate number of centers
  3. Determine the most suitable cluster of counties

Determining the Number of Clusters for Each State

Virginia

Cluster Analysis

Cluster characteristics, explain why we chose cluster 4

Vermont

Cluster Analysis

Cluster characteristics, explain why we chose cluster 1

Tennessee

Cluster Analysis

Visualizing clusters

Cluster characteristics, explain why we chose cluster 4

Iowa

Cluster Analysis

Visualizing clusters

Cluster characteristics, explain why we chose cluster 5

California

Cluster Analysis

Visualizing clusters

Cluster characteristics, explain why we chose cluster 1

Final Recommendations

Summarize overall process and thinking and impact of starbucks. Show the data table of all important counties from clusters we chose:

| County | Avg. Income | Num. Starbucks | Avg. Unemployment Rate |
|---|---|---|---|
| Fresno | 55523.57 | 36 | 12.326804 |
| Glenn | 53148.00 | 0 | 11.998625 |
| Kern | 49580.61 | 39 | 11.487972 |
| Madera | 49355.76 | 5 | 11.890378 |
| Merced | 49976.11 | 7 | 13.486942 |
| Modoc | 43340.33 | 0 | 10.394502 |
| Monterey | 68913.00 | 1 | 9.826804 |
| Plumas | 64924.83 | 0 | 11.346392 |
| San Joaquin | 71281.69 | 24 | 10.629553 |
| Santa Cruz | 82669.83 | 7 | 11.217886 |
| Shasta | 53369.18 | 10 | 9.644330 |
| Stanislaus | 48900.86 | 7 | 11.919244 |
| Tehama | 48808.75 | 1 | 9.508935 |
| Tulare | 45336.69 | 17 | 13.573540 |
| Yuba | 55171.50 | 0 | 12.325086 |
| Adams | 65864.50 | 0 | 6.569547 |
| Appanoose | 49003.50 | 0 | 5.678395 |
| Benton | 57531.00 | 0 | 6.154527 |
| Buchanan | 63804.00 | 0 | 7.088080 |
| Butler | 61836.00 | 0 | 5.995722 |
| Calhoun | 43333.00 | 0 | 7.597668 |
| Carroll | 55611.00 | 0 | 5.938277 |
| Cherokee | 55162.00 | 0 | 6.286043 |
| Clarke | 43619.50 | 0 | 7.021777 |
| Clay | 46965.00 | 0 | 6.565042 |
| Clayton | 54832.50 | 0 | 5.518210 |
| Clinton | 59001.00 | 0 | 6.144743 |
| Crawford | 51775.50 | 0 | 6.761908 |
| Dallas | 78098.67 | 0 | 7.271241 |
| Delaware | 55472.17 | 0 | 5.291101 |
| Fayette | 55887.25 | 0 | 6.797021 |
| Franklin | 50078.00 | 0 | 5.943186 |
| Greene | 58107.83 | 0 | 6.827746 |
| Grundy | 64870.50 | 0 | 6.497376 |
| Hardin | 58296.00 | 0 | 6.925958 |
| Henry | 51271.00 | 0 | 6.525775 |
| Jackson | 47535.00 | 0 | 6.540742 |
| Jasper | 59793.11 | 1 | 6.459512 |
| Jefferson | 56939.00 | 0 | 6.596809 |
| Linn | 73450.25 | 9 | 7.200540 |
| Lyon | 54713.00 | 0 | 5.473609 |
| Marion | 47388.00 | 0 | 7.014973 |
| Marshall | 60222.00 | 0 | 6.233670 |
| Osceola | 49948.00 | 0 | 5.801299 |
| Pocahontas | 50181.50 | 0 | 6.703086 |
| Union | 43619.50 | 0 | 6.206117 |
| Warren | 60983.96 | 7 | 5.924982 |
| Washington | 49947.33 | 0 | 5.938779 |
| Webster | 54712.17 | 0 | 6.443786 |
| Chittenden | 83283.50 | 0 | 3.387654 |
| Brunswick | 35873.00 | 0 | 7.469872 |
| Cumberland | 46083.00 | 0 | 6.302316 |
| Grayson | 32076.00 | 0 | 7.063253 |
| Greensville | 33192.67 | 0 | 5.819667 |
| Halifax | 29411.00 | 0 | 8.662019 |
| Lee | 31678.50 | 0 | 7.173886 |
| Northampton | 39718.25 | 0 | 6.739979 |
| Prince Edward | 54858.00 | 0 | 6.603667 |
| Scott | 46810.00 | 0 | 5.713129 |
| Smyth | 39226.00 | 0 | 8.147333 |
| Carter | 41428.75 | 0 | 6.840901 |
| Chester | 24221.00 | 0 | 7.115124 |
| Fentress | 39775.00 | 0 | 9.083333 |
| Hardeman | 26503.00 | 0 | 7.497459 |
| Lawrence | 45854.50 | 0 | 7.462405 |
| Lewis | 41536.00 | 0 | 8.101558 |
| Macon | 40533.00 | 0 | 7.106605 |
| Monroe | 48462.00 | 0 | 6.983337 |
| Obion | 45863.00 | 0 | 7.635802 |
| Rhea | 38409.50 | 0 | 8.395988 |
| Unicoi | 47091.25 | 0 | 8.077778 |
| Van Buren | 41919.00 | 0 | 7.148339 |
| Weakley | 40819.00 | 0 | 6.972840 |

Limitations and Conclusion

Paragraph here

Citations

List of important packages/data sets